A0-A115  569  AIR  FORCE  INST  OF  TECH  HRIGHT-PATTERSON  AFB  OH  SCHOO— ETC  F/G  12 
A  NE*  METHOD  OF  SOLVING  I IL-CONO I T IONED  SYSTEMS  OF  EQUATIONS. (U> 
OEC  81  B  S  BIRMINGHAM  5  U 

UNCLASSIFIED  AF I T/GA/MA/81D-1  NL 


V 

,  J,- 

AFIT/GA/MA/  81D-1 


A  NEW  METHOD  OF  SOLVING  ILL-CONDITIONED 
SYSTEMS  OF  EQUATIONS 


THESIS 


AFIT/GA/MA/ 8 ID- 1 


Brian  S.  Birmingham 
2nd  Lt  USAF 


Approved  for  public  release;  distribution  unlimited 


AFIT/GA/MA/81D-  1 


A  NEW  METHOD  OF  SOLVING  ILL-CONDITIONED 
SYSTEMS  OF  EQUATIONS 


Presented  to  the  Faculty  of  the  School  of  Engineering 
of  the  Air  Force  Institute  of  Technology 
Air  University 

in  Partial  Fulfillment  of  the 
Requirements  for  the  Degree  of 
Master  of  Science 


Accession  For 
NT  IS  GRAS:  I 
DTIC  IAB 
Unannounced 
Justification 


By. 

Distribution/  _  __ 

Availability  Codes 
Avail  and/or 

,  '.Dl'-t  !  Special 

by 

Brian  S.  Birmingham 
2nd  Lt  USAF 

Graduate  Astronautical  Engineering 
December  1981 

Approved  for  public  release;  distribution  unlimited 


□  □ 


Preface 


I  should  like  to  thank  my  thesis  advisor.  Dr.  W. 

S.  Ericksen,  for  his  guidance  and  unbounded  patience. 
When  I  strayed  from  the  Ordained  Path,  he  was  a  Shepherd, 
leading  me  back  to  the  Fold. 

Also,  I  extend  thanks  to  my  thesis  committee.  Dr. 
Quinn,  Capt  J.E.  Marsh,  and  Dr.  Paul  Nikolai. 

Brian  S.  Birmingham 


i  i 


TABLE  OF  CONTENTS 


Page 

Preface .  ii 

List  of  Tables .  iv 

Notations .  v 

Abstract .  vi 

I.  Introduction  .  1 

II.  Iterative  Improvement .  5 

III.  Masking .  7 

IV.  Methods .  8 

V.  Results .  10 

Given  x,  obtain  B,  and  solve  Ax  *  6  ...  .  10 

For  the  system  Ax=E,  given  A 

and  E,  solve  for  x .  14 

VI.  Recommendation .  23 

VII.  Conclusions .  24 

Bibliography  .  25 

Appendix  A:  Derivation  of  Iterative  Scheme  and 
Convergance  Condition  . 


26 


List  of  Tables 


Table  Page 

I  Effects  of  Iterative  Improvement .  5 

II  Effects  of  masking  using  a  given  x,  Case  1.  .  .  11 

III  Effects  of  masking  using  a  given  x,  Case  2.  .  .  12 

IV  Effects  of  masking  using  a  given  x.  Case  3.  .  .  13 

V  Effects  of  masking  using  a  given  x.  Case  4.  .  .  14 

VI  Residual  errors  for  a  given  b . 19 

VII  Residual  errors  with  masking  and  a  given  b.  .  .  20 

VIII  Residual  errors  with  masking  and 

multiplier  with  a  given  b . 22 

IX  Residual  errors  with  masking  and  multiplier 


with  a  given  6  (matrix  from  ABMAT  was  used).  .  23 


iv 


Notation 


A  -  the  matrix  of  coefficients  in  the  system  of 
equations,  Ax  =  b 

x  -  the  solution  vector  of  the  system,  Ax  =  b 

b  -  the  vector  of  constants  on  the  right-hand  side 

of  a  system  of  equation's 

A1  -  the  masked  version  of  A 

A2  -  the  matrix  obtained  when  A1  is  subtracted 
from  A 
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Abstract 

The  problems  associated  with  ill-conditioned 
matrices  are  well  known  and  widespread.  As  of  yet,  there 
are  no  general  solutions  to  the  problem.  The  only 
remedies  are  usually  ad  hoc  and  are  extremely 
case-dependent . 

This  paper  presents  a  method  which  gives  good 
results  for  a  large  variety  of  situations.  The  method 
involves  masking,  a  technique  which  sets  the  lower-order 
bits  (the  number  of  bits  varies)  to  zero,  and  then 
applying  standard  system-of -equat ions  solvers.  Most  of 
the  computer  runs  made  used  the  Hilbert  matrix,  and 
either  a  specified  b  vector  (as  in  the  form.  Ax  *  6),  or 
a  specified  x  vector,  with  the  b  obtained  by  multiplying 
x  by  A. 


A  NEW  METHOD  OF  SOLVING  ILL-CONDITIONED 
SYSTEMS  OF  EQUATIONS 
by 

Brian  S.  Birmingham 
2nd  Lt  USAF 

Dr.  W.  S.  Ericksen,  Advisor 

"I  have  often  admired  the  secret  magic  of  numbers” 

Sir  Thomas  Browne 

Ill-conditioned  matrices  are  a  constant  source  of 
irritation  for  programmers,  and  provide  much  frustration  for 
engineers  and  scientists  who,  even  though  they  have  modelled 
their  particular  physical  system  accurately,  have  no  way  of 
obtaining  reasonable  solutions  from  a  system-of-equat : on 
solver. 

The  problem  is  not  completely  understood,  yet  it  lies 
in  the  fact  that  small  perturbations  to  the  matrix  elements 
result  in  large  errors  in  the  solution.  Any  matrix 
exhibiting  this  property  is  termed  ill-conditioned. 

Part  of  the  problem  lies  with  the  floating-point 
number  system.  A  computer  word  is  a  finite-length  entity, 
and  difficulties  arise  in  trying  to  represent  an  infinite 
digit  number  with  a  finite  number  of  digits.  Many  constants 
are  irrational,  such  as  it  or  e,  but  even  rational  numbers 
present  problems,  such  as  2/3  or  4/7.  Even  with  a  large 
wordlength  machine,  the  inaccuracies  present  can  cause 
problems  for  all  but  the  most  well -conditioned  systems.  One 
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must  also  consider  numbers  that  are  finitely  represented 
in  decimal,  yet  require  an  infinite  number  of  digits  in 
other  number  bases.  Johnson  and  Riess  give  a  good 
description  of  this  [3].  For  example,  -Ijq  is 
represented  perfectly  in  base  10  with  a  finite  number  of 

digits.  Yet  in  base  16,  .1^  =  .19999 . and 

in  binary,  .Ijq  =  *001  1001  1001  1001  1001. 

Yet  another  problem  with  the  floating-point  number 
system  is  that  of  round  off.  There  are  three  main  causes 
of  computer  round-off  error  [7],  all  of  which  play  a 
prominent  role  in  matrix  computation: 

1.  The  addition  of  Large  Numbers  to  Small  Numbers 

If  a  particular  matrix  requires  the  addition 

of,  say,  3.8  x  1012  plus  1.56  x  10"13, 

1  2 

the  result  will  probably  be  3.8  x  10  , 

i.e.,  1.56  x  10  did  not  show  up. 

Granted,  the  error  involved  is  small,  yet  for 
large  systems  the  effects  may  build  up.  This 
is  call  accumulation  error.  One  solution  is 
to  simply  increase  the  wordlength,  either  by 
machine  design  or  by  the  use  of  double 
precision.  While  this  may  give  good  results, 
the  scheme  still  treats  the  symptoms  and  not 
the  cause. 

2.  Two  almost  Equal  Numbers  being  Subtracted 
For  example,  .123456789  -  .123456788  = 
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.000000001.  Before  the  operation  we  had  9 

digits  of  accuracy.  After  the  subtraction 

only  one  digit  of  accuracy  is  left.  Many 

times,  a  situation  like  this  will  give  an 

error  that  is  several  orders  of  magnitude 

bigger  than  the  actual  numbers. 

3.  Division  by  Small  Number 

Obviously,  a  division  by  0  will  result  in  an 

overflow.  However,  a  division  by  a  small 
- 1 4 

number,  say  10  ,  may  also  result  in  an 

overflow.  There  is  another  problem  in  that  a 
number  with  only  one  or  two  significant 
digits. 

The  error  in  the  floating-point  number  system  can  be 
determined.  If  x  is  a  real  number,  we  can  represent  it  as, 
x  =  +_  0.n1n2n3n4.  .  .  .n^  x  10e 
where  k  depends  on  s,  and  can  be  infinitely  large. 

However,  x  is  represented  as  an  approximation  in  a 

I 

finite  wordlength  machine  as  x  ,  where 

*  e 
x  =  +0 .n. n,n, . . .n  x  10 
1  ^  <3  s 

where  s  is  the  precision  of  the  machine. 

The  error  involved  is, 

I x  -  x' |<  h  X  10e's 

Blum  goes  into  this  more  rigorously  [1],  as  does  Knuth  [4]. 

With  the  problem  inherent  in  the  floating-point  number 
system,  one  might  be  tempted  to  give  up  completely  any 
attempts  at  system-of -equat ion  solving.  Unfortunately, 
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things  get  worse  before  they  get  (better ,  of  course,  is  a 
relative  term) . 

Another  aspect  should  be  discussed,  that  of  condition 
numbers.  Generally,  the  condition  number  of  a  matrix  is 
used  as  an  indicator  of  i 1 1 -condit ioning .  However,  in  order 
to  obtain  the  condition  number,  one  must  first  obtain  the 
norm.  The  two  most  common  vector  norms  are  ||x||^,  and 
| j  x|  |  t  called  the  one-norm  and  the  two-norm,  respectively. 

CD 

They  are  defined  as, 

11*11,-  I  l*il 

1  i=l  1 

11*11  =  max  |x-i 

00  i 

The  vector  norm  used  in  this  study  was  the  one-norm.  The 

two  matrix  norms  corresponding  to  the  above  are 

K 

I  |A|  L  =  max  Z  IxJ 
1  J  i=i  1 

which  is  simply  the  maximum  absolute  column  sum.  Also, 

K 

( ] A { 1  =  ra|x  ] Aij | (maximum  absolute  row  sum) 

H  00 

where  A  is  the  Hermitian  transpose  of  A.  This  paper  used 
the  matrix  one-norm.  The  condition  number  of  A,  denoted  as 
cond(A),  is  now  defined  as 

cond(A)  =  ( [A| |  ( { A'1 [ [ 

Immediately  one  can  see  that  the  condition  number  is  best 
used  as  an  a  posteriori  estimate,  since  the  condition  number 
involves  computing  the  inverse  of  A. 

It  must,  however,  be  kept  in  mind  that  a  condition 
number  is  not  an  absolute  measure  of  conditioning.  Many 
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matrices  can  exhibit  large  condition  numbers,  and  still  be 
we  1 1 -cond i t  ioned .  The  Hilbert  matrix,  on  the  other  hand, 
has  a  reasonable  condition  number  (for  the  lower  orders), 
yet  is  highly  ill-conditioned.  But  generally,  large 
condition  numbers  suggest  i 1 1 -cond it ioning .  There  are  many 
other  norms  and  condition  number,  explained  in  detail  in 
( 5 ] ,  and  [ 6 1  - 

II.  Iterative  Improvement 

While  the  typical  user  may  face  the  Scylla  of  word- 
length  constraints,  and  the  Charybdis  of  condition  number, 
he  has  a  powerful  tool  at  his  disposal,  that  of  iterative 
improvement . 

The  effects  of  iterative  improvement  are  shown  in  the 
table  below.  The  A  is  defined  on  page  9.  P  is  a  parameter 
used  in  defining  A. 

Table  I 

Effects  of  Iterative  Improvement 

P  =  0.5  P  =  0.5  -  10'5 

Exact  Floating-  Non-Exact  Floating- 

Point  Elements  Point  Elements 


Non-improved  Improved  Non-improved  Improved 


k 

cond (A) 

IIBI-Bll 

ll6ll 

1 i B2-B ! | 

TFTl 

1 Ibi-bI 1 

mm 

I  I B2 -B  i  1 

TT*TI 

2 

2.5 

+ 

03 

0.0 

0.0 

1.9 

-  13 

3.6  -  14 

4 

4.9 

+ 

08 

2.6  -  11 

0.0 

7.8 

-  11 

7.9  -  11 

6 

3.2 

+ 

13 

2.3  -  08 

0.0 

1.4 

-  07 

8.8  -  08 

8 

1.2 

+ 

18 

3.1  -  04 

0.0 

2.0 

-  04 

9.0  -  05 

10 

3.2 

+ 

22 

3.8  -  01 

0.0 

1.3 

-  01 

2.9  -  01 

5 


where 


N  =  16 

B1  =  A"1  from  LINV1F 
B2  =  A'1  from  LINV2F 

LINV1F  and  LINV2F  are  routines  in  the  IMSL  software  library. 

The  concept  involves  improvement  upon  the  obtained 
solution.  Generally,  the  improvement  is  focused  on  the 
residuals,  where 
r  =  Ax  -  b 

Hopefully,  r  =  0,  but  usually,  because  of  conditions 
previously  discussed,  there  will  be  error.  Typically  one 
would  proceed: 

Solve  Ax  =  b, 
compute  ?  =  Ax  -  b, 

_  I 

Solve  Ax  =  b  -  r , 

Solve  Ay  =  r, 
and  obtain, 

_  -  -  I 

y  =  x  -  x 

and  thus  obtaining  a  correction  for  x.  This  process  is 
repeated  until  ] |y| [<e,  where  e  is  a  user-defined  tolerance. 

Another  form  of  iterative  improvement  involves 
splitting,  whereby  A  is  split  into  two  matrices  such  that, 

A  =  M  +  N 

The  iterative  process  then  takes  the  form, 

Mxl  +  1  =  Nx1  +  6 
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and  the  initial  x  is  obtained  by  solving 
Mx^  =  b 


The  only  requirement  is  that  M1  exist. 


Maskim 


The  method  pursued  here  involved  masking  (a  variation 
of  splitting),  which  essentially  sets  a  number  of  lower 
order  bits  equal  to  zero.  The  number  of  bits  set  equal  to 
zero  varied. 

For  example,  if  one  wanted  to  use  a  mask  of  6  bits, 
i.e.,  set  the  last  6  bits  equal  to  zero,  the  mask  would  be 
defined  as 

MASK  *  77777 . 7700g 

and  then  each  element  of  the  matrix  would  be  put  through  the 
AND  function  with  MASK.  This  process  would  keep  the 
exponent  and  the  significant  portion  of  the  mantissa 
untouched,  while  setting  the  last  6  bits  equal  to  zero.  For 
coding  purposes,  simply  set  MASK  *  -77g  and  excute  the 
following  (CDC  convention): 

DO  10  I  =  1 , K 
DO  20  J  =  1  ,K 

A1(I,J)  =  A( I , J)  .AND.  MASK 
20  CONTINUE 
10  CONTINUE 

DO  30  I  =  1,K 
DO  40  J  =  1,K 
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A2( I ,  J)  =  A(I,J)  -  A1(I,JJ 
40  CONTINUE 


30  CONTINUE 

where  K  is  the  order  of  the  matrix,  Al  is  the  masked  version 
of  A  and 

A2  =  A  -  Al 

IV.  Methods 

The  methods  of  dealing  with  the  solution  of  Ax  =  b 
were  used  in  this  study.  The  first  involved  specifying  a 
solution  vector  x,  then  multiplying  by  A  to  obtain  b.  Then 
A  and  b  were  sent  to  a  system-of -equat ion  solver  (LEQT2F 
from  the  IMSL  package,  and  SGEIM  from  the  LINPACK  package!. 
For  a  control  run,  the  complete  (unmasked)  A  and  b  were  used 
and  the  results  noted.  Then  A  was  masked  and  b  computed, 
using  the  same  x.  Then  the  masked  A,  called  Al ,  and  the 

-  -  f 

resulting  b,  called  b  ,  were  sent  to  the  system-of- 
equation  solver. 

The  second  problem  involved  specifying  a  b,  and 
sending  A  and  b  to  the  solver.  A  form  of  iterative 
improvement  was  used  after  each  return  from  the  solver, 
namely , 

Alxi+1  «  6  -  A2X1 

and  the  initial  x  was  obtained  by  solving, 

Alx  =  6. 

The  derivation  and  covergence  conditions  are  contained  in 
Appendix  A. 
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For  the  first  case,  an  error  check  consisted  of 

I  lx  -  -  x  ,  .  .  ,  ! I ,  where  the  norm  was  the  sum  of  the 

1  1  given  obtained  > i 

absolute  values  of  each  element. 

For  the  second  case,  the  iteration  was  stopped  when 
||x1+1  -  x 1 J |  =  0.  For  final  error  estimates,  a  check 
on  the  residuals  was  used, 


error  =  | j  Ax  -  b I 

iFil 

When  b  was  specified,  there  was  no  iteration  on  the 
solution  when  a  mask  was  not  used,  since  A2  =  [0]. 

For  most  of  the  results,  the  Hilbert  matrix  was 
used.  The  Hilbert  is  very  easy  to  construct  and  has  the 
added  bonus  of  all  elements  being  integers.  The  other 
matrix  used  came  from  the  AFIT  subroutine,  ABMAT.  It  is 
constructed  using  the  following  algorithm: 


Au  '  p 


J  =  1 ,  K 


\  l  N  ^  _  N ! 

An  "lyi-y  u-i)i  (N-i+i)! 
(^0.  if  i-  1>N 


i  =  2,K  provided 
i-l<N 


A.  .  =  A.  .  .  +  A.  ,  •  i 

lj  l.J-l  i-l, J-l 


i  ,  j  «  2,K 

where  P  is  any  real,  non-zero  number  and  N  is  any  non¬ 
negative  integer.  As  in  the  Hilbert,  this  matris  is  easy  to 
obtain,  the  inverse  is  easy  to  obtain,  and  i 1 1 -cond it ioning 
is  easily  obtained  by  specifying  P  to  be  some  rational 
number  plus  a  small  perturbation,  such  as 
P  *  .5  +  10‘13 
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V.  Results 


All  of  the  results  generated  here  were  obtained  on 
the  CDC  Cyber  74,  which  has  a  60  bit  word.  The  Operating 
System  is  the  NOS/BE,  and  the  language  used  was  Fortran  4. 
The  default,  no  rounding,  was  used  in  all  computations. 

A;  Given  x,  obtain  6,  solve  Ax  =  6.  This  method 

used  a  given  x,  which  was  multiplied  by  A  to  obtain  b,  and 

then  A  and  b  were  sent  to  the  system-of -equat ion  solver. 

Although  this  in  not  a  "real  world"  example,  this  method  is 

useful  since  it  gives  an  absolute  check  on  the  solution 

obtained.  Residual  checking  is  not  needed  since  we  know 

what  the  solution  must  be.  All  runs  in  this  section  were 

made  with  LEQT2F,  and  the  Hilbert  matrix. 

For  a  control,  the  straight  (unmasked)  A  and 

corresponding  b  were  sent  to  LEQT2F.  For  the  masked  case,  A 

-  » 

was  masked  first ,  then  x  multiplied  by  A  to  obtain  b  ,  and 
then  A  and  b  were  sent  to  LEQT2F.  Note  the  b  v ill  differ 
from  b.  The  following  table  gives  the  lowest  error  for  a 
given  order,  with 

iT  -  [1.1.1 . 1.] 
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Table  II 

Effects  of  masking  using  a  given  x. 

Case  1 

•der 

Straight  Error 

Masked  Error 

5 

. 97053E-09 

. 14219E-13 

10 

.  19547E-01 

0.0 

15 

. 21316E-13 

20 

.71454E-14 

25 

.714S4E-14 

30 

0.0 

35 

0.0 

40 

. 21316E-13 

45 

.28422E-13 

50 

. 28422E-13 

Although  the  unmasked  system  failed  to  converge  after  K  = 

12,  the  masked  version  did  converge,  up  to  K  *  50,  and  the 
errors  are  remarkable.  Perhaps  a  table  of  condition  numbers 


would  help  to  emphasize: 


ORDER 

CONDITION  NUMBER 

5 

. 94366E+06 

10 

.  35357E  +  14 

15 

. 1 5393E+22 

20 

. 62836E+29 

25 

.  27745E+37 

30 

.  1 1 777E+45 

35 

.  51 856E+5 2 

40 

.  22451E+60 

45 

. 98823E+67 

50 

. 43303E+7  5 

Even  with  the  condition  number 

masked  solution  converged. 

present  for  K  =  50,  the 
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Next,  an  x  of 


x  = 


10.0 

1.0 

1.0 

1.0 


1.0 


was  used,  and  the  table  now  gives 


Table  III 


Effects  of  masking  using  a  given  x. 

Case  2 

Order 

Straight  Error 

Masked  Error 

5 

.  32923E-08 

.  71054E- 13 

10 

.  74683E-01 

0.0 

15 

.  28422E-13 

20 

0.0 

25 

.  7  8160E- 1 3 

30 

.49738E-13 

35 

.  63949E- 13 

40 

•78159E-15 

45 

.  92370E-13 

50 

.42635E-13 

As  before,  the  unmasked  solution  failed  to  converge  for 
orders  greater  than  |  |,  yet  the  masked  solution  converged  for 
all  orders  up  to  50. 
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Masked  Error 
. 11369E-12 
. 28422E-13 
. 39790E- 1 2 
. 56843E-12 
. 56843E- 1 2 . 
.79S81E-12 
. 90949E-12 
. 96634E- 1 2 
. 10800E- 1 1 
. 11937E-11 


was  used  and  the  results  are: 


Table  V 


Effects  of  masking  using  a  given  x,  Case  4 


Order 

Straight  Error 

Masked  Erro 

5 

.  28456E-09 

0.0 

10 

.43743E-01 

.  11368E-12 

15 

.  19895E- 1 2 

20 

•31974E-12 

25 

.  85265E-12 

30 

.  11013E-11 

35 

.  11795E-11 

40 

.  20037E-11 

45 

.22879E-11 

50 

.  31690E-11 

Once  again, 

these  results  are  remarkable  and 

have  been 

verified  by  W.  Ericksen. 


B.  For  the  system  Ax  =  b,  given  A  and  b,  solve  for 
x.  This  method  is  a  good  test  problem  because  the  typical 
user  will  have  at  his  disposal  the  A  matrix  and  the  b 
vector,  with  no  knowledge  of  x,  or  at  best,  an  idea  of  the 
range  of  values  in  x. 

For  the  first  part  of  this  section,  the  Hilbert  was 
used  for  reasons  given  previously.  The  choice  of  b  was  made 
so  as  to  give  the  system-of -equat ion  solver  the  worst 
possible  situation. 
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For  this  reason,  we  chose. 


where  the  length  of  b  depended  on  the  order  of  the  system. 

By  choosing  this  particular  vector,  the  solution,  x,  will  be 
the  first  column  of  the  inverse  of  A.  This  is  one  of  the 
hardest  cases  to  solve  since  a  system-of -equat ion  solver 
typically  does  not  solve  for  the  inverse,  and  this  b  will 
force  it  to  do  so.  Also,  the  norm  of  x  will  be  quite  large, 
as  shown  later.  Since  the  first  column  of  the  inverse  a  A 
will  be  needed,  the  choice  of  the  Hilbert  matrix  is 
convenient  because  of  the  ease  of  computing  the  inverse 
Hilbert.  Physically,  the  6  chosen  could  represent  a  unit 
impulse  applied  to  only  one  input. 

The  program  flow  can  be  described  heuristically : 

1) .  Obtain  A  and  A*1  (A'1  used  for 

computation  of  condtA)) 

2)  .  Set  6=0  and  6(1)  =  1. 

3) .  Solve  Ax  =  6  and  save  x 


IS 


4). 


Apply  a  mask  to  A  and  solve  Alx  =  b, 
using  the  interative  procedure,  Alx1+^ 

=  b  -  A2X1,  stopping  when  convergence 
is  satisfied. 

5) .  Compare  results  obtained  from  5)  and  4J 

6) a.  Apply  different  mask  and  repeat  4)  and  5) 

b.  if  all  masks  have  been  applied,  increment 

order  of  matrix  and  go  to  1). 

Step  1)  is  easily  implemented  by  a  call  to  a 

subroutine  and  giving  as  input  the  order  desired.  Step  2) 

is  straightforward.  Step  3)  is  merely  a  call  to  a 

system-or-equation  solver,  and  inputting  A  and  b,  along  with 

appropriate  work  matrices  and  parameters.  The  primary 

solver  used  was  LEQT2F.  LEQT2F  applies  iterative 

improvement  internally.  SGELM  was  used  intermittently. 

SGEIM  also  uses  internal  iterative  improvement. 

Step  4}  can  best  be  described  with  a  "pseudo" 

flowchart : 
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Apply  mask  to  A  and 
obtain  A1 


Send  Al  and  b  to  solver. 
Set  x  =  x1,  where  i  =  0 


xi+1  =  b  -  A2xx 


Using  Alx 


obtain  b  ,  where 
b '  =  b  -  A2X1 


Send  Al 

and  t 

* 

to  solver, 

check  | 

>;i+1 

-  xMl 

Is  ||*i+1  -  X1!!  ■  0.' 


Step  5) 

The  norm  used  for  convergence  checks  was  simply, 

I  lx-1*1  -  i‘|l  ■  j  Ixf1  -  xf| 

3  31  J  J 

The  comparison  in  step  5)  was  performed  using  residuals 
i .  e .  , 


error 


|  Ax  -  b I 


INI 

The  error  with  masking  used  the  x  obtained  from  step 
3j,  and  the  error  with  masking  used  the  x  obtained  from  step 
4) . 

Step  6)a  simply  incremented  the  mask  applied.  The 
masks  used  were,  in  order, 

■Og  ,  - 1  g »  ^  8  ’  '  ^  8  ’  ”^^8*  ^  8  ’ 

-777g ,  -7777g ,  -77777g,  -777777 g,  -7777777g. 

These  correspond  to  the  number  of  bits  set  equal  to  zero, 
respectively, 

0  bits,  1  bits,  2  bits,  3  bits,  4  bits,  5  bits,  6 

bits,  9  bits,  12  bits,  15  bits,  18  bits,  21 

bits . 

After  each  incrementation  of  the  mask,  the  flow  proceeded  to 
step  4),  since  solving  the  original  system,  Ax  =  b,  would 
give  identical  results. 

After  the  last  mask  had  been  applied,  the  order  of 
the  matrix  was  incremented  in  step  6)b,  and  the  program 
began  a  new  at  step  l).  The  program  was  stopped  at  an  order 
of  12,  since  LEQT2F  and  SGEIM  failed  to  converge  while 
solving  Ax  =  b.  Although  error  estimates  could  have  been 
obtained,  the  results  might  possibly  be  faulty  and 
misleading . 

The  error  in  the  iterative  loop,  or  the  convergence 
criteria  in  step  4)  was  forced  to  equal  zero.  In  other 
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words,  the  elements  in  the  final  solution  vector  were  equal, 
on  a  bit-by-bit  basis,  to  the  elements  in  the  preceding 
solution  vector.  This  constraint  presented  no  problems  in 
convergence.  However,  several  times  the  solution  failed  to 
converge  after  30  iterations.  Convergence  could  only  be 
obtained  by  letting  the  tolerance  assume  unacceptable  values- 
As  a  comparison,  the  unmasked  Hilbert  and  a  b  of  all 
zeroes,  with  b(l)  =  1  was  used: 


ORDER 

Residua  1 

RESIDUAL  ERROR 

(.  LEQTlF ) 

Table  VI 

errors  for  a  given  5 

RESIDUAL  ERROR 

(LEQT2F) 

RESIDUAL  ERROR 

(SGEIM) 

4 

. 7S351E-12 

.69972E-13 

, 69972E-15 

5 

. 37334E-11 

.17584E-11 

.17584E-11 

6 

. 1714  IE-10 

. 39022E-10 

.39022E'll 

7 

.82591E-10 

•67809E-10 

.67809E-10 

8 

. 79065E-09 

. 70000E-09 

. 70000E-09 

9 

. 55734E-08 

. 213D1E-08 

.  21301E-08 

10 

. 51631E-07 

•36709E-07 

.36709E-07 

11 

. 16794E-06 

.90699E-07 

.  90699E-07 

LEQT1F  had  bigger  errors  due  to  lack-  of  iterative 
improvement. . 
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Now  masking  was  implemented,  using  the  same  b: 


Table  VII 


Residual  errors 

with  masking  and  a 

given  5 

ORDER 

RESIDUAL  ERROR 

RESIDUAL  ERROR 

MASK 

(LEQT1F) 

( LEQT2F) 

4 

. 69972E-13 

.69972E-13 

-ig 

5 

.87SS5E-12 

.  8755SE-12 

-7777777g 

6 

•33090E-11 

.  33090E- 1 1 

-7777777g 

7 

. 40057E- 10 

•40057E-10 

-7777g 

8 

. 48427E-09 

•48427E-09 

-7777g 

9 

. 144S3E-08 

.  14453E-08 

-7777g 

10 

. 28956E-07 

.28956E-07 

~778 

11 

•90699E-07 

. 90699E-07 

-8 

Notice  that  masking  improved  the  error  for  almost  every 
order  chosen. 

The  solution  vector  for  this  section  is  simply  the 
first  column  of  the  inverse  Hilbert.  The  solution  vector 


for  order  10  is: 

r 


X  -  * 


100 

-4950 

79200 

-600600 

2522520 

-6306300 

9609600 

-8751600 

4375800 

-923780 
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a 


The  norm  of  this  vector  is  very  large.  However,  the  large 
norm  of  x  motivated  the  use  of  a  multiplier,  namely, 

V 

0.0 
0.0 
0.0 


0.0 

By  choosing  V  to  be,  say,  10  the  solution  vector  will 
be  smaller,  thereby  bringing  the  solution  vector  closer  to  a 
smaller  norm.  Now  the  method  involved  choosing  an  order  and 
a  multiplier  and  cycling  through  all  of  the  masks.  Then, 
the  multiplier  was  incremented  and  the  process  repeated. 

_  O 

The  range  of  the  multipliers  were  10  through  1.  The 
following  table  gives  the  best  error  obtained  for  a  given 
order.  For  comparison  purposes,  some  of  the  previous 
results  are  given: 
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Table  VIII 


Residual  errors  with  masking 
ORDER  RESIDUAL  ERROR  RESIDUAL  ERROR 


and  multiplier  with  a 


(LEQTlF) 
(no  masking) 


. 69972E-13 
.17S84E-11 
.39022E-10 
. 67809E-10 
. 70000E-09 
•21301E-08 
.3P709E-07 
.  90699E-07 


(LEQT2F) 
(masking  only) 
. 69972E-13 
.  87555E-12 
.  33090E- 1 1 
.40057E-10 
.  48427E-09 
.  14453E-08 
. 28956E-07 
.  90699E-07 


RESIDUAL  ERROR 
(LEQT2F) 
(mask  8  mult) 

.  968S7E-14 
.  18736E-12 
.  35767E-12 


.24875E-1C 
. 89556E-10 
. 85877E-09 
. 43302E-08 


MASK 


-7777777, 


-7777777, 


.49903:5-11  -7777777, 


-1111 { 
-1111 , 


MULT 


While  masking  gives  improved  results,  the  results  using  a 

multiplier  are  even  better. 

As  explained  in  Appendix  A,  convergence  is  satisfied 

if  | | A1 _1A2 | | <1 .  A  check  was  made  for  all  masks  and 

convergence  was  satisfied. 

A  different  A  was  now  used,  obtained  from  ABMAT,  as 

-13 

previously  discussed.  An  N  of  N  =  16  and  a  P  =  .5  +  10 
was  chosen.  The  following  table  gives  results  for  no 
masking,  masking,  and  masking  with  a  multiplier: 
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Table  IX 


ORDER 


4 

5 

6 

7 

8 


Residual  errors  with  masking 

and  multiplier 

with  a  given 

_5 

RESIDUAL  ERROR 

(LEQT1F) 

(no  masking) 

(matrix  from  . 

RESIDUAL  ERROR 

(LEQT2F) 
(masking  only) 

ABMAT  was  used) 

RESIDUAL  ERROR 

(LEQT2F) 
(mask  8  mult) 

MASK 

MULT 

•37255E-08 

.37255E-08 

. 37255E-08 

-7777777g 

1.0 

. 61402E -04 

. 52083E-04 

. 11945E-05 

-38 

10'6 

. 10688E-01 

. 10688E-01 

. 831 89E-04 

-178 

10'4 

. 96037E+00 

. 59864E+00 

. 90140E-01 

~178 

10‘2 

.4617  8E+02 

.4617  8E+02 

. 15700E+02 

10"8 

The  results  stopped  at  order  8  because  LEQT2F  failed  to 
converge  for  higher  orders.  These  results  have  been 
verified  by  W.  Ericksen. 

CPU  times  have  not  been  given  due  to  the  amount  of 
extraneous  computation. 


VI.  Recommendations 

All  of  the  results  were  obtained  with  a  60-bit 
machine.  Other  machines  should  be  used.  Possible,  the 
algorithm  devised  by  Brent  [2],  could  be  implemented,  since 
it  can  simulate  any  wordlength  desired.  Also,  the  results 
obtained  in  Va  should  be  investigated  more  thoroughly, 
perhaps  by  trying  different  x  vectors  and  using  other 
matrices . 
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VII.  Conclusion 


The  results  obtained  show  masking  to  be  quite  useful 
in  ill-conditioned  systems  when  an  A  and  b  are  specified, 
and  x  not  known.  However,  for  some  cases,  improvement  was 
negligible.  Masking  is  easily  implemented,  and  can  provide 
good  results  for  a  minimum  increase  in  computer  time. 

The  results  of  Va  are  quite  spectacular,  and  while 
not  completely  realistic,  they  are  important  enough  to 
warrant  further  investigation.  They  show  that  for  small 
changes  in  A  and  b,  convergence  is  obtained  where  it  was 
previously  impossible. 

"As  yet  a  child,  nor  yet  a  Fool  to  Fame, 

I  lisp'd  in  numbers,  for  the  numbers  came" 


Alexander  Pope 
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Appendix  A 

Let  the  system  be  defined  as 
Ax  =  b 

and  let 

A  =  A1  +  A2 


where 


A1  =  A(masked) 

Now  the  system  becomes 
(A1  +  A2)x  =  b 
Alx  +  A2x  =  6 
Alx  =  b  -  A2x 

Since  A2  has  only  small  elements,  our  iteration  can  proceed 
as 

Alx1+1  *  b  -  A2x 1 

where  is  obtained  by  solving  Alx  *  b 

For  convergence, 

Al(x-x1)  *  -A2(x-xq) 

(x-Xj)  +  -AI'^AZIx-Xq) 

A1(x-x2)  "-AZtx-Xj)  *  A2A1"1A2(x-xq) 

(x-x2)  *  -  A1 "  ^  A2  (.  x-x^  ) 

Al(x-x-)  ■  -A2(X'X23.«  (-A2)(-Al"1(x-x1)) 

-  (-A2)l-Al*1A2)(-Al1) 
(x‘-x’0) 

=  ( -A2 ) ( -Al *1A2)2 
(x’-x'q) 

(x-x5)  *  C-Al’1 )5Cx-Xq) 
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so, 

x-xn  *  C-Al'1A2)n(x-x0) 

j|x-x”{{  <  ll-Al'^ll  n  ||*-*oll 
j  j  -  A.1 " 1 A2  |  | n  <  I- 
- | 1 Al"1A2| |n  <  I- 

since  ilAl'1A2||  is  always  greater  than  zero 
llAl'1A2ll  <  1.  for  convergence. 

These  derivations  are  due  to  W.  Ericksen 
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